Healthcare Price Transparency: Cutting Through the Noise¶

Analyzing Negotiated Insurance Rates Across Providers, Plans, and Procedures

Written by John Mahoney


A project to decode the Transparency in Coverage mandate and turn terabytes of raw data into usable insights for providers.

Background: When Transparency Creates Confusion¶

In theory, the Transparency in Coverage rule was a game changer: insurers are now required to publicly share every rate they've negotiated with providers — down to the CPT/HCPCS code, plan, and NPI level.

But in practice?

  • Insurers dumped terabytes of data across thousands of URLs
  • Files are broken into fragments with no consistent format
  • The same rate may be repeated dozens of times under different plan IDs or modifiers
  • Many files are so large they crash Excel, Notepad, or even cloud tools
  • Providers and analysts are left without tools to interpret the data

Instead of unlocking transparency, the system created a fog of data that only large tech teams can navigate.

This notebook demonstrates how to take that unstructured chaos and transform it into something clear, searchable, and useful.

Why This Data Matters to Providers¶

While the raw data may be messy, the insights it contains are powerful — especially for providers.

Here’s how providers can benefit by making sense of transparency data:

  • Benchmark Your Rates
    See how your negotiated rates compare to other providers in your region — by payer, CPT code, and facility type.

  • Negotiate Smarter Contracts
    Identify payers that consistently reimburse lower than average for key services and bring the data into contract negotiations.

  • Explore Expansion Opportunities
    Discover ZIP codes where reimbursements are high but provider competition is low — ideal for growth.

  • Optimize Service Mix
    Focus on services with favorable reimbursement trends in your area.

  • Gain Leverage with Payers
    Show evidence-based pricing comparisons to justify better rates or highlight inequities.

For the first time, providers have a window into the true negotiated landscape — no longer limited to chargemasters or anecdotal rates.

What This Notebook Contains¶

This notebook showcases real-world examples of how transparency data can be transformed into actionable insights for providers and healthcare analysts.


1. Procedure-Level Analytics by NPI¶

The first dashboard provides detailed analysis of a single procedure code, showing how one provider compares to nearby competitors:

  • Bell Curve of maximum negotiated rates with your rate highlighted
  • Top Providers by Max Rate
  • Rate Frequency Distribution
  • Rates vs Distance (scatter plot to show travel-adjusted value)

Useful for comparing your reimbursement rates against peers for specific CPT/HCPCS codes within a custom radius and by payer.


2. Facility-Level Market Analysis by Zip Code¶

The next dashboard helps analyze the competitive landscape based on facility type:

  • Negotiated Rates at Nearby Facilities, grouped by payer
  • Distance-Based Rate Trends
  • Bell Curve Comparison by Payer
  • Number of Providers in Area
  • Rates by Distance - Scatter Chart
  • Price Range of Competition by Distance
  • Interactive Map of Competitor Locations

Great for identifying new areas to expand your practice, scout for favorable rate zones, or analyze payer competition by geography.


Interactive Features¶

These dashboards are designed to allow filtering by:

  • Procedure Code
  • Taxonomy Code
  • Insurance Company
  • Zip Code and Radius
  • NPI (National Provider Identifier)

How the Data Was Processed¶

Transforming terabytes of payer data into usable insights requires more than just downloads — it requires smart filtering, deduplication, and architectural efficiency.

This notebook draws from a specialized import pipeline tailored to each insurance company’s unique quirks and formats.


Step-by-Step Breakdown¶

1. Custom Import Scripts by Payer¶

  • Each payer provides a unique JSON index file listing available data files.
  • My importer reads those indexes and downloads only files relevant to selected CPT/HCPCS codes.
  • Because each payer formats their data differently, custom rules and decoders are applied to extract consistent results.

2. Two-Stage Import Logic¶

  • Files are typically split into two sections:
    • Provider Groups
    • Negotiated Rates
  • The importer first scans for target procedure codes inside the Negotiated Rates and saves only the relevant rate entries and their associated provider group IDs.
  • In the second pass, the importer pulls only provider info for those specific group IDs — saving significant storage and time.

This smart filtering prevents loading all provider data unnecessarily — most providers are irrelevant to the procedure code being analyzed.

3. Parallelized Processing¶

  • The import runs on multiple hard drives simultaneously to speed up processing.
  • Final output is stored locally as cleaned JSON, not raw files.

4. Supporting Data¶

  • Downloaded NPI data from NPPES in order to get location and taxonomys of all NPIs in transparancy data.
  • Downloaded GEO codes via API from the census database to get Longitude and Lattitude for all Addresses and zipcodes
  • Downloaded census data to create a table of all zipcodes and counties in every state.

Database Architecture¶

Once all imports are complete:

  • All local JSON outputs are merged into a centralized database.
  • Data is organized into multiple normalized tables to handle:
    • Many-to-many links between rates and NPIs
    • Multiple modifiers and service codes per rate
    • A rate-to-provider link table for fast joins

Deduplication¶

  • Several queries run after initial load to remove redundant rates, NPIs, and plan IDs that are duplicated across files.

Scalable Structure (In Progress)¶

  • Final design will include:
    • A separate database per U.S. state
    • Partitioning by procedure code within each database
  • This allows for fast, cloud-based querying at scale and supports near-instant report generation for end users.
  • One payor in one state can have over 100 million rows and this is with less then 100 procedure codes being picked.

The architecture is built for performance — optimized to serve large analytics queries from cloud-based platforms in real time.

💻 Tech Stack + Notebook Usage Overview¶

This notebook is part of a larger healthcare analytics platform designed to analyze and visualize negotiated rate data with speed and clarity.

⚙️ Core Technologies¶

  • Python – backend processing and data transformations
  • Pandas / NumPy – efficient structured data manipulation
  • Plotly / Matplotlib / Seaborn – interactive and static visualizations
  • MySQL – main data store for millions of negotiated rates
  • Custom JSON Parsers – normalize and standardize payer-specific formats

🔒 Notebook Design (No Live Database)¶

Although the full system uses MySQL for data storage and querying, this notebook does not connect to a live database.

Instead, it uses pre-extracted and cleaned sample data to simulate what the live system would generate. This makes it ideal for:

  • Running in a local environment without credentials
  • Fast demonstrations or reviews
  • Avoiding cloud hosting costs during testing phase

💡 The database will only be cloud-deployed in the production environment to keep development costs minimal.


🧪 What You’ll See in This Notebook¶

This notebook walks through real-world usage scenarios by mimicking the functionality of the future web application. Here’s how:

  1. Procedure Code Selection

    • Example code: 90837
  2. Search Filters Applied

    • ZIP Code: 22192
    • Radius (miles): 50 miles
    • Taxonomy Code: multiple partial codes queired
    • Insurance Payer: BCBS and UHC
  3. Data Visualizations + Analytics

    • Procedure Rate - Bell Curve
    • Top Providers by Max Negotiated Rate - Column Chart
    • Rate Frequency Distribution - Column Chart
    • Rate vs Distance from Input provider's zip - Scatter Chart
    • Payer Comparison: Negotiated Rate Spread - Column Chart
    • Rate vs Distance from ZIP Code - Line Chart
    • Overlapping Bell Curves by Insurance Payor - Bell Chart
    • Nearby Facilities by Taxonomy Code (Flexible Matching) - Column Chart
    • Rate Behavior by Distance from Input ZIP Code - Scatter Chart
    • Average Max and Min Rates by Distance - Line Chart
    • Geographic Map of Competing Providers - Video demo of mapped providors.

📊 All charts and outputs below reflect what would be dynamically generated in the live platform — just using offline data for now.

You can scroll through each section without needing to configure anything — it’s fully self-contained.

In [3]:
pip install plotly
Requirement already satisfied: plotly in i:\python\lib\site-packages (6.0.1)
Requirement already satisfied: narwhals>=1.15.1 in i:\python\lib\site-packages (from plotly) (1.34.1)
Requirement already satisfied: packaging in i:\python\lib\site-packages (from plotly) (24.2)
Note: you may need to restart the kernel to use updated packages.
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
In [4]:
pip install seaborn
Requirement already satisfied: seaborn in i:\python\lib\site-packages (0.13.2)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in i:\python\lib\site-packages (from seaborn) (2.2.2)
Requirement already satisfied: pandas>=1.2 in i:\python\lib\site-packages (from seaborn) (2.2.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in i:\python\lib\site-packages (from seaborn) (3.10.0)
Requirement already satisfied: contourpy>=1.0.1 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.3.1)
Requirement already satisfied: cycler>=0.10 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.55.5)
Requirement already satisfied: kiwisolver>=1.3.1 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.8)
Requirement already satisfied: packaging>=20.0 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (24.2)
Requirement already satisfied: pillow>=8 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (11.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.1)
Requirement already satisfied: python-dateutil>=2.7 in i:\python\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in i:\python\lib\site-packages (from pandas>=1.2->seaborn) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in i:\python\lib\site-packages (from pandas>=1.2->seaborn) (2025.1)
Requirement already satisfied: six>=1.5 in i:\python\lib\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.17.0)
Note: you may need to restart the kernel to use updated packages.
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
In [5]:
from IPython.display import HTML

HTML("""
<video width="1000" controls>
  <source src="export/chrome_C6ooVNkkFY.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")
Out[5]:
Your browser does not support the video tag.
In [6]:
import pandas as pd
import json
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

#Some of the transparancy data has outliers which have to be ignored so the data isn't skewed.
def remove_outliers_iqr(df, column):
    """
    Removes outliers from the DataFrame using the IQR method on a specific column.

    Parameters:
        df (pd.DataFrame): The input DataFrame.
        column (str): Name of the numeric column to evaluate for outliers.

    Returns:
        pd.DataFrame: Filtered DataFrame with outliers removed.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1

    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    #only throw out the outliers if the first and third quartiles have a difference more then 100.
    if q3-q1 > 100:
        return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
    else:
        return df


# load all the pre queried data
df_npi_compared = remove_outliers_iqr(pd.read_json("export/results3.json"),'max_rate')
df_uhc_raw = pd.read_json("export/finalresults3.json")
df_bcbs_raw = pd.read_json("export/finalresults9.json")

# remove outliers from all
df_uhc = remove_outliers_iqr(df_uhc_raw, 'max_rate')
df_uhc = remove_outliers_iqr(df_uhc, 'min_rate')
df_bcbs = remove_outliers_iqr(df_bcbs_raw, 'max_rate')
df_bcbs = remove_outliers_iqr(df_bcbs, 'min_rate')



# load more pre queried data that holds data used for the map visual
with open("export/BCBSrates.json", "r") as f:
    raw_data = json.load(f)

# Flatten the structure
rows = []
for npi, entries in raw_data.items():
    for entry in entries:
        rows.append({
            "npi": npi,
            "rate": float(entry["rate"]),
            "distance": entry["distance"]
        })

# Convert to DataFrame
df_bcbs2 = pd.DataFrame(rows)



with open("export/UHCrates.json", "r") as f:
    raw_data = json.load(f)

# Flatten the structure
rows = []
for npi, entries in raw_data.items():
    for entry in entries:
        rows.append({
            "npi": npi,
            "rate": float(entry["rate"]),
            "distance": entry["distance"]
        })

# Convert to DataFrame
df_uhc2 = pd.DataFrame(rows)



#This data holds the rate for given NPI loaded. This would be the user's NPI information
with open("export/myMaxRate.json", "r") as f:
    my_max = float(json.load(f))

df_npi_compared.head()
Out[6]:
npi legal_business_name billing_code max_rate longitude latitude tax1 tax2 otherName firstName longZ latZ distance
0 1255300471 INSIGHT HEALTH CORP 90837 204.67 NaN NaN 261QR0200X -77.314673 38.683699 0.0
1 1356779599 COMPREHENSIVE SLEEP CARE CENTER INC 90837 290.09 -77.355383 38.678935 207RS0012X -77.314673 38.683699 0.0
2 1366591562 GREATER PRINCE WILLIAM AREA COMMUNITY HEALTH C... 90837 236.25 -77.314055 38.686735 261QF0400X -77.314673 38.683699 0.0
3 1417112236 NORTH VIRGINIA FAMILY PRACTICE PC 90837 190.27 -77.340124 38.673827 207Q00000X -77.314673 38.683699 0.0
4 1548739170 SAVIDA HEALTH, PC 90837 253.90 -77.333383 38.667565 207RA0401X 2084P0800X -77.314673 38.683699 0.0

Procedure Rate Bell Curve¶

This chart shows the distribution of negotiated rates for a selected procedure code across all providers within a given radius and payor.

The provider's own rate (based on the entered NPI) is highlighted, allowing for quick visual comparison against competitors.

This is especially useful in identifying whether a provider is:

  • Below market
  • At the average
  • Significantly above peers
In [7]:
import matplotlib.pyplot as plt
import numpy as np

# Data prep
rates = df_npi_compared["max_rate"].dropna()
mean = rates.mean()
std_dev = rates.std()
minx = rates.min()
maxx = rates.max()

x = np.linspace(minx, maxx, 2000)
pdf = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev)**2)

# Plot setup
plt.figure(figsize=(12, 6))
plt.plot(x, pdf, label="Bell Curve", color="#008080", linewidth=2)
plt.fill_between(x, pdf, alpha=0.15, color="#008080")

# Annotation: Draws a vertical line on Bell curve showing the input NPI's rate to be used as a comparison
plt.axvline(my_max, color='crimson', linestyle='-', linewidth=2, label='Your Rate')
plt.text(my_max, max(pdf)*0.92, f'Your Rate\n${my_max:.2f}', color='crimson',
         fontsize=10, ha='center', va='bottom', weight='bold',
         bbox=dict(facecolor='white', edgecolor='crimson', boxstyle='round,pad=0.3'))

# Annotation:  Draw a vertical line at the 84th Percentile
eightyfourth = mean + std_dev
plt.axvline(eightyfourth, color='royalblue', linestyle='--', linewidth=2, label='84th Percentile')
plt.text(eightyfourth, max(pdf)*0.85, f'84th Percentile\n${eightyfourth:.2f}', color='royalblue',
         fontsize=10, ha='center', va='bottom', weight='bold',
         bbox=dict(facecolor='white', edgecolor='royalblue', boxstyle='round,pad=0.3'))

# Label and Style bell curve
plt.title("Distribution of Negotiated Rates", fontsize=14, weight='bold')
plt.xlabel("Negotiated Rate ($)", fontsize=12)
plt.ylabel("Probability Density", fontsize=12)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.grid(alpha=0.3)
plt.legend(frameon=True, fontsize=10)
plt.tight_layout()
plt.show()
No description has been provided for this image

Top Providers by Max Negotiated Rate¶

This chart displays the providers receiving the highest negotiated rates for the selected procedure code under the chosen payer.

This allows for:

  • Benchmarking against top earners
  • Identifying outliers in reimbursement trends
  • Understanding potential rate disparities across organizations
In [8]:
#This plot will create a column chart with the top 19 facilities by Max_Rate with the input(user's) facility 1st.  
import plotly.express as px
# Sort by distance and keep top 20
df_top20 = df_npi_compared.sort_values("max_rate", ascending=False).head(20)

# 2. Insert your facility as first row
my_row = pd.DataFrame([{
    'legal_business_name': 'My Rate',
    'max_rate': my_max
}])

# 3. Combine your row + competitors
df_chart = pd.concat([my_row, df_top20], ignore_index=True)
df_chart = df_chart.drop_duplicates(subset="legal_business_name")



fig = px.bar(
    df_chart,
    x='legal_business_name',  
    y='max_rate',
    title="Pricing Comparison (10 Closest Facilities)",
    labels={
        'provider_name': 'Provider',
        'max_rate': 'Negotiated Rate ($)'
    },
    color_discrete_sequence=['#1f77b4']
)

fig.update_layout(
    xaxis_tickangle=40,
    showlegend=False,
    title_font=dict(size=18),
    margin=dict(t=60, l=40, r=40, b=100),
    xaxis_tickfont=dict(size=10),
    yaxis_tickformat='$,.2f'
)

fig.show()

Rate Frequency Distribution¶

This bar chart shows how often each negotiated rate appears among providers in the filtered area.

By viewing this, providers can better understand:

  • How common their reimbursement rate is
  • Where the clustering of rates occurs
  • How many unique rates exist for this procedure within the payer's network
In [9]:
#This section will create a column chart showing the frequency of different Max_Rates for all facilities 
#in the retrieved data.  This provides a quick visual on the trends for pricing for the given area and procedure code.
rate_counts = df_npi_compared["max_rate"].dropna().round(0).value_counts().sort_index()
df_rate_counts = rate_counts.reset_index()
df_rate_counts.columns = ['negotiated_rate', 'frequency']

df_rate_counts.head()
fig1 = px.bar(
    df_rate_counts,
    x='negotiated_rate',
    y='frequency',
    labels={
        'negotiated_rate': 'Negotiated Rate ($)',
        'frequency': 'Frequency'
    },
    title="Competitor Rate Distribution",
    color_discrete_sequence=['#1f2a44']  # Dark navy blue
)

fig1.update_layout(xaxis_tickangle=-45)
fig1.show()

Rate vs Distance from Input provider's zip¶

Each point represents a provider, plotting their negotiated rate against their distance from the selected ZIP code. Useful for understanding geographic pricing patterns and travel-based trade-offs.

In [10]:
#Scatter plot to show all Max_Rates for all facilies in data by distance from user's zipcode
fig = px.scatter(
    df_npi_compared,
    x='distance',
    y='max_rate',
    title='Rates vs Distance from Facility',
    labels={
        'distance': 'Distance (miles)',
        'max_rate': 'Negotiated Rate ($)'
    },
    opacity=0.7
)

fig.update_traces(marker=dict(size=6, color='darkblue'))  # Adjust dot size/color
fig.update_layout(
    title_font=dict(size=18),
    xaxis=dict(tickformat=".1f"),
    yaxis=dict(tickformat="$,.2f"),
    plot_bgcolor='white'
)

fig.show()
In [19]:
from IPython.display import display, HTML

# Limit the number of rows if needed (optional)
html_table = df_npi_compared.to_html(max_rows=50)

# Display the table in a scrollable div
display(HTML(f"""
<div style="height:300px; overflow:auto; border:1px solid #ccc; padding:10px">
    {html_table}
</div>
"""))
npi legal_business_name billing_code max_rate longitude latitude tax1 tax2 otherName firstName longZ latZ distance
0 1255300471 INSIGHT HEALTH CORP 90837 204.67 NaN NaN 261QR0200X -77.314673 38.683699 0.000000
1 1356779599 COMPREHENSIVE SLEEP CARE CENTER INC 90837 290.09 -77.355383 38.678935 207RS0012X -77.314673 38.683699 0.000000
2 1366591562 GREATER PRINCE WILLIAM AREA COMMUNITY HEALTH CENTER INC 90837 236.25 -77.314055 38.686735 261QF0400X -77.314673 38.683699 0.000000
3 1417112236 NORTH VIRGINIA FAMILY PRACTICE PC 90837 190.27 -77.340124 38.673827 207Q00000X -77.314673 38.683699 0.000000
4 1548739170 SAVIDA HEALTH, PC 90837 253.90 -77.333383 38.667565 207RA0401X 2084P0800X -77.314673 38.683699 0.000000
5 1417355033 US PRIMARY AND URGENT CARE INC 90837 276.69 -77.257429 38.674802 261QP2300X -77.314673 38.683699 0.000000
6 1528627734 RADIOLOGY INSTITUTE 90837 276.69 NaN NaN 2085R0202X 261QR0200X -77.314673 38.683699 0.000000
7 1689965865 PRINCE WILLIAM PEDIATRICS AND ADOLESCENT CENTER 90837 276.69 NaN NaN 261QP2300X -77.314673 38.683699 0.000000
8 1790487486 HARRIET V GREENFIELD NP, LLC 90837 276.69 NaN NaN 261QP2300X -77.314673 38.683699 0.000000
9 1962103747 OPTIMAL SLEEP AND WEIGHT LOSS CLINIC, PLLC 90837 276.69 NaN NaN 207RS0012X 261QS1200X -77.314673 38.683699 0.000000
10 1831466440 NMG AFFILIATE PRACTICE I, LLC 90837 205.87 NaN NaN 208600000X -77.314673 38.683699 0.000000
11 1407162076 CAPITAL HEALTHCARE PC DBA ALLCARE 90837 347.65 -77.333497 38.674425 207Q00000X 207R00000X -77.314673 38.683699 0.000000
12 1003129826 BIOSERENITY DT INC. 90837 188.90 -77.304570 38.657435 261QS1200X -77.314673 38.683699 0.000000
13 1013402593 MEDNOVATIONS LLC 90837 289.32 -77.265900 38.675804 261QP2300X -77.314673 38.683699 0.000000
14 1699196246 HEART AND VASCULAR INSTITUTE OF VIRGINIA, INC 90837 289.32 NaN NaN 207RC0000X -77.314673 38.683699 0.000000
15 1790104487 H&B CARE PLLC 90837 289.32 NaN NaN 261QP2300X -77.314673 38.683699 0.000000
16 1831620251 CAVERY WELLNESS OF WOODBRIDGE 90837 289.32 NaN NaN 207QB0002X -77.314673 38.683699 0.000000
17 1881015436 VISTA HEART & VASCULAR, PLLC 90837 289.32 NaN NaN 207RC0000X -77.314673 38.683699 0.000000
18 1992137590 MAGNUS HEART AND VASCULAR, INC. 90837 289.32 NaN NaN 207RC0000X 207UN0901X -77.314673 38.683699 0.000000
19 1700441326 CAVERY HEALTH CLINIC 90837 155.08 NaN NaN 207Q00000X -77.314673 38.683699 0.000000
20 1932632858 COMMUNITY HEALTH SERVICES, INC. 90837 155.08 NaN NaN 207Q00000X 207Q00000X -77.314673 38.683699 0.000000
22 1174829832 PRINCE WILLIAM NEUROLOGY CENTER INC 90837 159.92 NaN NaN 2084N0400X -77.352000 38.644195 3.392152
23 1952463606 DOMINION FAMILY HEALTH, PLC 90837 275.58 NaN NaN 207Q00000X -77.352000 38.644195 3.392152
24 1134460876 EASY CARE URGENT CARE 90837 441.21 -77.368489 38.660417 207Q00000X -77.352000 38.644195 3.392152
25 1740443183 MILLENIUM MEDICAL CORP 90837 398.82 NaN NaN 207Q00000X 207R00000X -77.352000 38.644195 3.392152
... ... ... ... ... ... ... ... ... ... ... ... ... ...
1956 1881913275 GEORGE KURIAN, M.D. 90837 372.03 NaN NaN 207RG0100X -76.629444 39.161942 49.486093
1957 1225227846 SAWHNEY NEUROLOGY PA 90837 473.29 NaN NaN 2084N0400X 208VP0000X -76.629444 39.161942 49.486093
1958 1750562898 AMBI PHYSICIAN PC 90837 473.29 NaN NaN 261QP2300X -76.629444 39.161942 49.486093
1959 1972557304 BALTIMORE VAMC 90837 170.96 NaN NaN 261QV0200X -76.629444 39.161942 49.486093
1960 1710071725 BALTIMORE WASHINGTON PROFESSIONAL SERVICES INC. 90837 260.63 NaN NaN 208600000X -76.629444 39.161942 49.486093
1961 1821277856 BALTIMORE WASHINGTON PROFESSIONAL SERVICES INC. 90837 260.63 NaN NaN 207RH0003X 207VG0400X -76.629444 39.161942 49.486093
1962 1972084507 UNIVERSITY OF MARYLAND BALTIMORE WASHINGTON HEART ASSOCIATES LLC 90837 260.63 NaN NaN 207RC0000X -76.629444 39.161942 49.486093
1963 1164436747 NORTH ARUNDEL CARDIOLOGY SERVICES, LLC 90837 191.50 NaN NaN 207RC0000X -76.629444 39.161942 49.486093
1964 1427130905 CRAIN MEDICAL P A 90837 235.20 -76.631716 39.144373 207R00000X -76.629444 39.161942 49.486093
1965 1275560732 YOUN AND MANNAN LLC 90837 169.76 NaN NaN 208000000X -76.629444 39.161942 49.486093
1966 1457685380 BHAWNA BAHETHI, MD LLC 90837 169.76 NaN NaN 207R00000X -76.629444 39.161942 49.486093
1967 1164656427 BIOSERENITY DT, INC. 90837 296.26 NaN NaN 261QS1200X -76.629444 39.161942 49.486093
1968 1477741049 PETER P RAMIREZ MD PA 90837 362.16 -76.618373 39.141100 207R00000X -76.629444 39.161942 49.486093
1969 1215929369 THE HEART CENTER OF NORTHERN ANNE ARUNDEL COUNTY PA 90837 355.93 NaN NaN 207RC0000X 207RC0001X -76.629444 39.161942 49.486093
1970 1124016696 BALTIMORE WASHINGTON MEDICAL CENTER INC. 90837 174.79 NaN NaN 207RC0200X 207RE0101X -76.629444 39.161942 49.486093
1971 1760694624 LUNG ASSOCIATES OF ANNE ARUNDEL PC 90837 176.48 NaN NaN 207RC0200X 207RI0200X -76.629444 39.161942 49.486093
1972 1538487582 CENTER FOR PAIN MANAGEMENT, LLC 90837 275.59 -76.631716 39.144373 208VP0014X -76.629444 39.161942 49.486093
1973 1770741753 CENTER FOR VEIN RESTORATION MD LLC 90837 185.23 NaN NaN 2086S0129X -76.629444 39.161942 49.486093
1974 1366435596 ARUNDEL MEDICAL GROUP INC 90837 254.65 -76.618409 39.158111 111N00000X 207Q00000X -76.629444 39.161942 49.486093
1975 1760415186 CHESAPEAKE OPEN MRI 90837 162.02 NaN NaN 2085R0202X -76.629444 39.161942 49.486093
1976 1417077660 MARYLAND HEART ASSOCIATESLLC 90837 145.74 -76.631830 39.149975 207RC0000X -76.629444 39.161942 49.486093
1977 1679507297 MARYLAND PULMONARY & CRITICAL CARE GROUP, P.A. 90837 221.58 NaN NaN 207RC0200X 207RP1001X -76.629444 39.161942 49.486093
1978 1851506232 UHA HARPERS FERRY FAM MED (PAASGRP) 90837 287.32 NaN NaN 207Q00000X -77.773612 39.306168 49.571924
1979 1003367566 ROBERT MARRIOTT MEDICAL CORP 90837 464.94 -77.153184 39.369478 208600000X -77.163984 39.394652 49.786361
1980 1972250330 SHAFI INC 90837 240.50 NaN NaN 207Q00000X -77.163984 39.394652 49.786361

Regional Scouting Report Based on ZIP Code Input¶

This section provides a detailed visual and data-driven overview of facilities and negotiated rates surrounding the selected ZIP code.

Think of it as a scouting report:
It helps evaluate the local reimbursement landscape, understand competitor positioning, and identify potential opportunities for network expansion or provider placement.

Charts and tables below highlight:

  • Provider rates within proximity
  • Rate distribution and patterns
  • Comparisons to your facility’s rate
  • Geographic trends by distance
In [12]:
from IPython.display import HTML

HTML("""
<video width="1000" controls>
  <source src="export/chrome_xJd8e4pcSJ.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")
Out[12]:
Your browser does not support the video tag.

Payer Comparison: Negotiated Rate Spread¶

This chart compares how different payers reimburse for the selected procedure code. It highlights payer-specific rate variability and can assist in contract negotiation or payer prioritization.

In [13]:
#Clustered column chart showing side by side UHC and BCBS rates sorted by distance from input zipcode.
import plotly.graph_objects as go

# Trim to first 30 for consistent labels
top_n = 30
df_bcbs.head()
df_bcbs_top = df_bcbs[['legal_business_name', 'max_rate']].dropna().head(top_n)
df_uhc_top = df_uhc[['legal_business_name', 'max_rate']].dropna().head(top_n)

# Trim provider names for display
df_bcbs_top['short_name'] = df_bcbs_top['legal_business_name'].str[:15]
df_uhc_top['short_name'] = df_uhc_top['legal_business_name'].str[:15]


# Join on position index to align the same rows
df_combined = pd.DataFrame({
    'Provider': df_bcbs_top['short_name'].reset_index(drop=True),
    'BCBS': df_bcbs_top['max_rate'].reset_index(drop=True),
    'UHC': df_uhc_top['max_rate'].reset_index(drop=True)
})

df_combined.head()




fig = go.Figure(data=[
    go.Bar(name='BCBS', x=df_combined['Provider'], y=df_combined['BCBS'], marker_color='#2196f3'),
    go.Bar(name='UHC', x=df_combined['Provider'], y=df_combined['UHC'], marker_color='#1976d2')
    # Add Aetna here if needed
])

fig.update_layout(
    barmode='group',
    title='Price Distribution by Competitor',
    xaxis_title='Competitors',
    yaxis_title='Max Rate ($)',
    xaxis_tickangle=70,
    xaxis_tickfont=dict(size=10),
    height=500
)

fig.show()

Rate vs Distance from ZIP Code¶

This scatter plot shows each provider's negotiated rate as a function of their distance from the input ZIP code.

It helps evaluate:

  • Whether higher rates are found closer or farther from the patient base
  • If travel time trade-offs might justify pricing differences
  • Geographic reimbursement trends across the region
In [14]:
#Line chart to show the average rates by distance.  2 lines will shown for BCBS and UHC

bins = list(range(0, 60, 5))  
labels = [f"{i}-{i+5}" for i in bins[:-1]]  

# Ensure clean copies before assignment
df_bcbs = df_bcbs.copy()
df_uhc = df_uhc.copy()

# BCBS processing
df_bcbs['distance_group'] = pd.cut(df_bcbs['distance'], bins=bins, labels=labels, right=False)
df_grouped = df_bcbs.groupby('distance_group', observed=True)['max_rate'].mean().reset_index()
df_grouped = df_grouped.dropna()
df_grouped['payer'] = 'BCBS'

# UHC processing
df_uhc['distance_group'] = pd.cut(df_uhc['distance'], bins=bins, labels=labels, right=False)
df_uhc_grouped = df_uhc.groupby('distance_group', observed=True)['max_rate'].mean().reset_index()
df_uhc_grouped = df_uhc_grouped.dropna()
df_uhc_grouped['payer'] = 'UHC'

# Combine and plot
df_combined = pd.concat([df_grouped, df_uhc_grouped])

import plotly.express as px

fig = px.line(
    df_combined,
    x='distance_group',
    y='max_rate',
    color='payer',
    markers=True,
    title='Average Max Rate by Distance Group (BCBS vs UHC)',
    labels={
        'distance_group': 'Distance (miles)',
        'max_rate': 'Average Max Rate ($)',
        'payer': 'Payer'
    }
)

fig.update_layout(
    yaxis=dict(tickformat='$,.2f'),
    plot_bgcolor='white',
    title_font=dict(size=18)
)

fig.show()

Overlapping Bell Curves by Insurance Payor¶

The chart below visualizes normalized bell curves representing the distribution of negotiated rates for different insurance payors in the selected area.

Each curve is generated using the mean and standard deviation of rates associated with a specific payer, then normalized to allow easy comparison regardless of absolute volume.

This visualization helps answer:

  • Which payors generally reimburse more or less?
  • How consistent or variable are the rates?
  • Is one payer more volatile or tightly clustered?

By overlapping the curves, you can quickly see how payor-specific reimbursement behavior compares across the local market.

In [15]:
#Overlapping bell curves will be generated for visualizing std_dev for both BCBS and UHC rates.
def generate_bell_curve(mean, std_dev, color, label, num_points=200):
    x = np.linspace(max(mean - 3*std_dev, 0), mean + 3*std_dev, num_points)
    pdf = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev) ** 2)
    pdf /= pdf.max()  # Normalize height to 1

    return go.Scatter(
        x=x,
        y=pdf,
        mode='lines',
        fill='tozeroy',
        name=f"{label} (μ={mean:.2f}, σ={std_dev:.2f})",
        line=dict(color=color, width=2),
        opacity=0.6
    )


# Clean and convert to float
bcbs_rates = df_bcbs['max_rate'].dropna().astype(float)
uhc_rates = df_uhc['max_rate'].dropna().astype(float)

# Stats
bcbs_mean = bcbs_rates.mean()
bcbs_std = bcbs_rates.std()

uhc_mean = uhc_rates.mean()
uhc_std = uhc_rates.std()

# Generate bell curve traces
bcbs_curve = generate_bell_curve(bcbs_mean, bcbs_std, 'orange', 'BCBS')
uhc_curve = generate_bell_curve(uhc_mean, uhc_std, 'blue', 'UHC')

fig = go.Figure([bcbs_curve, uhc_curve])

fig.update_layout(
    title="Negotiated Rate Distribution by Payor (Normalized Bell Curves)",
    xaxis_title="Negotiated Rate ($)",
    yaxis_title="Normalized Density",
    plot_bgcolor='white',
    title_font=dict(size=18),
    legend=dict(font=dict(size=12))
)

fig.show()

Nearby Facilities by Taxonomy Code (Flexible Matching)¶

This section displays facilities within a selected radius that match one or more taxonomy codes provided by the user. The input supports both:

  • Full taxonomy codes (e.g., 207Q00000X)
  • Partial matches using the first few characters (e.g., 207, 261)

This flexibility allows users to:

  • Explore specific specialties or broader provider categories
  • Analyze competitive saturation in nearby areas
  • Identify facility clusters or gaps in care coverage

The output below helps assess the density and distribution of similar providers based on taxonomy scope defined by the user.

In [16]:
#Column Chart showing number of facilities within the given distance from input zipcode.
# Step 1: Create the upper bounds for your bins (0–5, 0–10, ..., 0–40)
bin_edges = list(range(5, 45, 5))  # [5, 10, 15, ..., 40]

# Step 2: Compute cumulative counts
cumulative_data = []
for edge in bin_edges:
    count = (df_uhc['distance'] < edge).sum()
    cumulative_data.append({'Distance (miles)': f"0-{edge}", 'Number of Facilities': count})

# Step 3: Convert to DataFrame
df_cumulative = pd.DataFrame(cumulative_data)

fig = px.bar(
    df_cumulative,
    x='Distance (miles)',
    y='Number of Facilities',
    title='Number of Facilities within Certain Distances',
    text='Number of Facilities',
    labels={'Distance (miles)': 'Distance (miles)', 'Number of Facilities': 'Facility Count'},
    color_discrete_sequence=['#1f77b4']
)

fig.update_traces(
    textposition='outside',
    marker_line_color='black',
    marker_line_width=1.2
)

fig.update_layout(
    yaxis=dict(title='Number of Facilities', rangemode='tozero'),
    xaxis=dict(title='Distance (miles)'),
    plot_bgcolor='white',
    uniformtext_minsize=8,
    title_font=dict(size=18)
)

fig.show()

Rate Behavior by Distance from Input ZIP Code¶

The charts below explore how negotiated insurance rates change as the distance from the selected ZIP code increases.

Max Negotiated Rates by Distance (Scatter Plot)¶

Each dot represents the maximum negotiated rate for a facility, plotted against its distance from the input ZIP code. This provides insight into:

  • The spread of rate variability at different distances
  • Potential regional rate clustering
  • How far a provider might go to find better or worse payer contracts

This is useful for providers evaluating regional competitiveness or looking to benchmark themselves in a broader market.

In [17]:
#Scatter chart showing all of the rates for UHC and BCBS by distance.
# Make sure both dataframes have correct data types
df_bcbs = df_bcbs.copy()
df_uhc = df_uhc.copy()

# Now safe to convert column type
df_bcbs['max_rate'] = df_bcbs['max_rate'].astype(float)
df_uhc['max_rate'] = df_uhc['max_rate'].astype(float)


# Create scatter traces for each payer
trace_bcbs = go.Scatter(
    x=df_bcbs['distance'],
    y=df_bcbs['max_rate'],
    mode='markers',
    name='BCBS Rates',
    marker=dict(color='rgba(255, 87, 34, 0.6)', size=5, line=dict(width=1, color='rgba(255, 87, 34, 1)'))
)

trace_uhc = go.Scatter(
    x=df_uhc['distance'],
    y=df_uhc['max_rate'],
    mode='markers',
    name='UHC Rates',
    marker=dict(color='rgba(33, 150, 243, 0.6)', size=5, line=dict(width=1, color='rgba(33, 150, 243, 1)'))
)

# Combine traces in a figure
fig = go.Figure(data=[trace_uhc, trace_bcbs])

# Update layout
fig.update_layout(
    title="Max Negotiated Rates by Distance",
    xaxis_title="Distance (miles)",
    yaxis_title="Payment ($)",
    plot_bgcolor='white',
    legend=dict(font=dict(size=12)),
    title_font=dict(size=18)
)

fig.show()

Average Max and Min Rates by Distance (Line Chart)¶

This chart shows the average of the highest and lowest negotiated rates (across multiple payers such as BCBS, UHC, Aetna) at various distance intervals.

It highlights:

  • The pricing ceiling (average max rate) and floor (average min rate)
  • How the overall pricing landscape shifts geographically
  • Useful context for expansion decisions or payer negotiation strategy
In [18]:
# Line chart that will display the average max_rate and min_rate for UHC and BCBS combined by distance from input zipcode.
import pandas as pd
import plotly.graph_objects as go

# Assume df_combined has distance, max_rate, and min_rate columns (combined UHC + BCBS + AETNA if needed)
df_combined = pd.concat([df_uhc, df_bcbs], ignore_index=True)

# Define distance bins (you can adjust to suit your use case)
bins = list(range(0, 50, 5))  # [0–5, 5–10, ..., 45–50]
labels = [f"{i}-{i+5}" for i in bins[:-1]]

# Group by distance range
df_combined['distance_group'] = pd.cut(df_combined['distance'], bins=bins, labels=labels, right=False)

# Group and calculate mean for both max and min rates
df_avg = df_combined.groupby('distance_group', observed=True)[['max_rate', 'min_rate']].mean().reset_index()
df_avg = df_avg.dropna()

fig = go.Figure()

# Average Max Rate (royal blue)
fig.add_trace(go.Scatter(
    x=df_avg['distance_group'],
    y=df_avg['max_rate'],
    mode='lines+markers',
    name='Average Max Rate',
    line=dict(color='royalblue', width=3)
))

# Average Min Rate (solid red)
fig.add_trace(go.Scatter(
    x=df_avg['distance_group'],
    y=df_avg['min_rate'],
    mode='lines+markers',
    name='Average Min Rate',
    line=dict(color='red', width=3)
))

# Layout
fig.update_layout(
    title='Price Range of Competitors by Distance',
    xaxis_title='Distance (miles)',
    yaxis_title='Average Rate ($)',
    plot_bgcolor='white',
    legend=dict(font=dict(size=12)),
    title_font=dict(size=18),
    yaxis_tickformat='$,.2f'
)

fig.show()

Geographic Map of Competing Providers¶

This interactive map shows nearby facilities matching the selected taxonomy code, within the given search radius.

  • Hovering over a point shows that facility’s name and negotiated rate
  • Ideal for scouting competition, locating underserved zones, or identifying strong payers in nearby ZIP codes
In [18]:
from IPython.display import HTML

HTML("""
<video width=100% controls>
  <source src="export/input_demo_reencoded.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>
""")
Out[18]:
Your browser does not support the video tag.